Abstract

We propose directed time series regression, a new approach to estimating parameters of time-series
models for use in certainty equivalent model predictive control. The approach combines merits of least
squares regression and empirical optimization. Through a computational study involving a stochastic
version of a well known inverted pendulum balancing problem, we demonstrate that directed time series
regression can generate significant improvements in controller performance over either of the
aforementioned alternatives.



1 Introduction 

A common approach to stochastic control, sometimes referred to as certainty equivalent model predictive
control, involves at each time forecasting future outcomes of exogenous random variables and optimizing
control actions over a planning horizon under the assumption that these forecasts will be realized
(Bertsekas, 2005; Box et al., 2008). The first of the sequence of actions is taken. Forecasting and
optimization are repeated at each subsequent time step. In this paper, we develop a regression algorithm
that fits time series forecasting models for use in such a context.

We focus attention on linear systems with quadratic cost, though the issues and methods we introduce
generalize. In particular, we will consider a dynamic system that evolves over discrete time steps
$t = 0, 1, 2, \ldots$. At each time $t$, the state $x_t \in \mathbb{R}^p$ is updated according to

$$x_{t+1} = A x_t + B u_t + C w_t,$$

where $u_t$ is a control action (a real vector whose dimension matches $B$), $w_t \in \mathbb{R}^s$ is a
random disturbance, and $A$, $B$, and $C$ are known matrices of appropriate dimension. Each control action
$u_t$ is selected just before the disturbance $w_t$ is observed. The objective is to minimize average
expected cost

$$\lim_{h \to \infty} \mathrm{E}\left[\frac{1}{h} \sum_{t=0}^{h-1} g(x_t, u_t)\right],$$

where the per period cost function is a positive definite quadratic of the form
$g(x, u) = x^\top G_1 x + x^\top G_2 u + u^\top G_3 u$, for some $G_1$, $G_2$, and $G_3$. The following
example illustrates a specific context.

Example 1 Consider a problem of balancing an inverted pendulum on a cart, as illustrated in Figure 1.
Take the state to be a four-dimensional vector $x_t = [s_t \ \theta_t \ \dot{s}_t \ \dot{\theta}_t]^\top$,
where $s_t$ is the position of the cart, $\theta_t$ is the angle of the pendulum, and $\dot{s}_t$ and
$\dot{\theta}_t$ are their rates of change. The cart can move freely along the horizontal axis, and is
controlled by applying a voltage $u_t$ to its motor. We discretize time, and consider a linearization of
the system dynamics around the balance point as in Landry et al. (2005), which in the absence of
disturbances can be written as $x_{t+1} = A x_t + B u_t$ for some $A$ and $B$. We introduce to the system a
disturbance $w_t$, which can be thought of as a force exerted by a gust of wind at time $t$. With the
disturbances, the linearized system equation becomes $x_{t+1} = A x_t + B u_t + C w_t$ for some $C$. The
objective is to center the cart and balance the pole while minimizing energy expenditure, and this is
represented by a per period cost function $g(x_t, u_t) = x_t^\top G_1 x_t + G_3 u_t^2$, for some $G_1$ and
$G_3$, where the first term captures how far the current state is from being centered and balanced and the
second term reflects the energy applied by the control action.





Figure 1: Inverted pendulum on a cart. 



This example may seem simple and specialized, but our framework captures a broad set of applications.
For example, we could model the control of a robot that receives instructions from a human user by encoding
in $x_t$ both the physical state and a tentative plan. Then, $w_t$ could represent changes to the plan
communicated by the user and $u_t$ could represent an action taken by the robot as it tries to follow the plan.

An important issue is how future disturbances are forecasted. In the inverted pendulum problem, for 
example, accurate predictions of future wind patterns facilitate more effective control. We consider the use 
of linear time series models with coefficients estimated based on a historical sequence of disturbances. The 
main issue we focus on in this paper is how one should estimate coefficients. Note that the setting of linear
systems with quadratic cost and linear time series models is one of the most commonly used in applied work
and also offers a simple starting point for exploring how learning and control should be coordinated.

Given a linear model, it is common to estimate coefficients by minimizing squared error over historical 
data and then use the model with the resulting coefficients to generate forecasts for model predictive control. 
However, as we will discuss further, least squares regression leaves room for improvement when the observed 
process is not itself generated by the specified time series model. In particular, in such situations it can be 
beneficial to take the control objective into account when computing coefficients. 

One approach that does this is empirical optimization, sometimes referred to as empirical risk
minimization, which computes coefficients that would have minimized historical average cost given the
historical sequence of disturbances. Under certain technical assumptions, if an infinite history of
observations is available, empirical optimization yields coefficients that minimize future average expected
cost. However, with finite data, empirical optimization often works poorly because it overspecializes to
the data.

The main contribution of this paper is a new approach to fitting coefficients which we refer to as directed
time series regression. This approach combines merits of least squares regression and empirical optimization.
As we will demonstrate through a computational study of the inverted pendulum balancing problem, using
directed time series regression can lead to large improvements in controller performance over either of the
aforementioned alternatives. In relation to prior work, directed time series regression can be viewed as an
extension of directed regression (Kao et al., 2009) to a context involving forecasting and control. This
extension builds on several new contributions relative to what is presented in their work. First, dynamic
control problems are more complex than the static decision problem addressed in their work, since a state
process drives decisions and can become unstable whereas there is no notion of evolving state in static
problems. Second, the empirical optimization problem considered in their work is a convex quadratic
optimization problem, while the empirical optimization problem in our new context is not convex. We propose
efficient approximation algorithms to address this problem. Third, the cross-validation technique employed
in the algorithm of Kao et al. (2009) does not apply in our new time series context, and we devise a sliding
window cross-validation technique for this purpose.



2 Certainty Equivalent Model Predictive Control 

Recall that the problem at hand involves controlling a system that evolves according to

$$x_{t+1} = A x_t + B u_t + C w_t,$$

with a goal of minimizing average expected cost, where the per period cost function is a positive definite
quadratic of the form $g(x, u) = x^\top G_1 x + x^\top G_2 u + u^\top G_3 u$.

Although the time horizon of interest is infinite, to simplify the problem, at each time $t$, model predictive
control (MPC) aims to optimize a finite horizon objective of the form

$$\mathrm{E}_t\left[\sum_{\tau=t}^{t+M} g(x_\tau, u_\tau)\right],$$

where $M$ is the horizon time and the subscript $t$ indicates that the expectation is conditioned on
$x_t, w_{t-1}, w_{t-2}, w_{t-3}, \ldots$. After deriving a control policy that optimizes this objective, MPC
applies the policy at time $t$ to generate a control action $u_t$. Then, after observing $w_t$ and $x_{t+1}$,
a new finite horizon problem is formulated with a horizon spanning times $t+1$ through $t+1+M$. This new
problem is solved to produce a control action $u_{t+1}$. The process is repeated at each subsequent time step.

In order to compute expected future costs, MPC requires a model that provides a distribution over
future disturbance sequences. Certainty equivalent MPC simplifies the problem by assuming that future
disturbances follow a deterministic sequence $\hat{w}_{t,t}, \hat{w}_{t,t+1}, \hat{w}_{t,t+2}, \ldots$
generated at time $t$ by a forecasting model. Hence, the objective of the optimization problem solved at
time $t$ becomes

$$\sum_{\tau=t}^{t+M} g(x_\tau, u_\tau), \tag{1}$$

where the state dynamics are governed by the same linear difference equation as before but with disturbances
given by $(w_t, w_{t+1}, w_{t+2}, \ldots) = (\hat{w}_{t,t}, \hat{w}_{t,t+1}, \hat{w}_{t,t+2}, \ldots)$.

For our context, which involves a linear system and a quadratic cost function, certainty equivalent MPC
generates a control policy that is linear in state and forecasts. To simplify our discussion we will assume that
disturbances are scalar-valued, though the ideas and methods we will present extend to the vector-valued
case. Under this assumption, there are matrices $L$ and $H$, which depend only on $A$, $B$, $C$, $G_1$, $G_2$,
$G_3$, and $M$, such that at each time $t$, the action employed by certainty equivalent MPC is given by

$$u_t = L x_t + H \hat{w}_t^{t,t+M-1}, \tag{2}$$

where $\hat{w}_t^{t,t+M-1} = [\hat{w}_{t,t} \ \cdots \ \hat{w}_{t,t+M-1}]^\top$. In particular, this action
minimizes the objective in (1).
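One way to obtain matrices of this kind numerically is to stack the finite horizon problem into a single quadratic program in the control sequence and read off how the first action depends on the state and the forecasts. The following is a minimal sketch under stated assumptions rather than the procedure used in the paper: the function name `ce_mpc_gains` is hypothetical, the disturbance is scalar, and $G_1$ and $G_3$ are taken to be symmetric.

```python
import numpy as np

def ce_mpc_gains(A, B, C, G1, G2, G3, M):
    """Return (L, H) such that u_t = L x_t + H w_hat minimizes the
    deterministic objective over stages t, ..., t+M (scalar disturbance).
    Assumes G1 and G3 are symmetric; hypothetical helper, not from the paper."""
    p, q = B.shape
    C = np.asarray(C, dtype=float).reshape(p, 1)
    n = M + 1                                    # number of stages
    powers = [np.eye(p)]
    for _ in range(M):
        powers.append(A @ powers[-1])            # powers[j] = A^j
    Phi = np.vstack(powers)                      # effect of x_t on stacked states
    Gam_u = np.zeros((n * p, n * q))             # effect of stacked controls
    Gam_w = np.zeros((n * p, M))                 # effect of the M forecasts
    for j in range(1, n):
        for i in range(j):
            Gam_u[j * p:(j + 1) * p, i * q:(i + 1) * q] = powers[j - 1 - i] @ B
            Gam_w[j * p:(j + 1) * p, i:i + 1] = powers[j - 1 - i] @ C
    Q = np.kron(np.eye(n), G1)
    S = np.kron(np.eye(n), G2)
    R = np.kron(np.eye(n), G3)
    # First-order condition of the stacked quadratic in the control sequence.
    hess = 2 * Gam_u.T @ Q @ Gam_u + S.T @ Gam_u + Gam_u.T @ S + 2 * R
    lin = 2 * Gam_u.T @ Q + S.T
    gain = -np.linalg.solve(hess, lin)
    # Only the first action is applied, so keep its q rows.
    return (gain @ Phi)[:q], (gain @ Gam_w)[:q]
```

For a scalar system with $A = B = C = 1$, $G_1 = G_3 = 1$, $G_2 = 0$, and $M = 1$, the minimizer of $x_t^2 + u_t^2 + (x_t + u_t + \hat{w}_{t,t})^2$ is $u_t = -(x_t + \hat{w}_{t,t})/2$, so the sketch should return $L = H = -0.5$.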

Certainty equivalent MPC leads to effective decisions for a wide variety of control problems. For example,
in our context of linear-quadratic control, if the process that generates disturbances is known and ergodic
and each forecast $\hat{w}_{t,t+\tau}$ is equal to the expected disturbance $\mathrm{E}_t[w_{t+\tau}]$, then
the performance of certainty equivalent MPC becomes arbitrarily close to optimal as $M$ grows. Certainty
equivalent MPC has also been applied with success to a number of important nonlinear systems.



3 Forecasting Model 



We consider the use of a model driven by $K$ linear features

$$v_{k,t} = \sum_{\tau=1}^{T} \phi_{k,\tau} w_{t-\tau}, \quad k = 1, \ldots, K.$$

Here, each $\phi_{k,\tau}$ is a scalar constant and each $v_{k,t}$ represents a scalar feature that is useful
for predicting the next disturbance. We consider a model of the form

$$w_t = \sum_{k=1}^{K} r_k v_{k,t} + z_t,$$

where $r \in \mathbb{R}^K$ is a vector of model coefficients and $z_t \sim \mathcal{N}(0, \sigma_z^2)$ is
i.i.d. Gaussian noise. Note that the conditional expectation of disturbance $w_t$ at time $t$ is linear in
past disturbances.

It is straightforward to generate forecasts $\hat{w}_{t,t}, \hat{w}_{t,t+1}, \ldots, \hat{w}_{t,t+M-1}$ given
the coefficients $r$, features $\phi_1, \ldots, \phi_K$, and past disturbances $w_{t-1}, \ldots, w_{t-T}$. In
particular, let $\hat{w}_{t,\ell} = w_\ell$ for $\ell < t$, and for $\ell = t, \ldots, t+M-1$, recursively
compute

$$\hat{v}_{k,t,\ell} = \sum_{\tau=1}^{T} \phi_{k,\tau} \hat{w}_{t,\ell-\tau}, \quad k = 1, \ldots, K,$$

and

$$\hat{w}_{t,\ell} = \sum_{k=1}^{K} r_k \hat{v}_{k,t,\ell}.$$

In the event that the data is generated by our model, each forecast produced in this way is a conditional
expectation: $\hat{w}_{t,t+\tau} = \mathrm{E}_t[w_{t+\tau}]$.
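The recursion above can be sketched in a few lines. This is an illustrative sketch only: the function name `forecast` and the array layout (row $k$ of `phi` holds $\phi_{k,1}, \ldots, \phi_{k,T}$, and `past` is ordered oldest first) are assumptions made here, not conventions from the paper.

```python
import numpy as np

def forecast(r, phi, past, M):
    """Roll the feature model forward M steps.

    r    : (K,) model coefficients
    phi  : (K, T) feature weights; phi[k, tau-1] multiplies w_{l-tau}
    past : at least T past disturbances, ordered [..., w_{t-2}, w_{t-1}]
    Returns [w_hat_{t,t}, ..., w_hat_{t,t+M-1}].
    """
    r = np.asarray(r, dtype=float)
    phi = np.asarray(phi, dtype=float)
    T = phi.shape[1]
    buf = [float(x) for x in past][-T:]        # observed values stand in for l < t
    out = []
    for _ in range(M):
        lags = np.array(buf[::-1][:T])         # [w_{l-1}, ..., w_{l-T}]
        v = phi @ lags                         # features v_hat_{k,t,l}
        w_hat = float(r @ v)                   # forecast w_hat_{t,l}
        out.append(w_hat)
        buf.append(w_hat)                      # forecasts feed later features
    return np.array(out)
```

With a single feature $v_{1,t} = w_{t-1}$ and $r_1 = 0.5$, a last observation of 2 yields the forecasts 1, 0.5, 0.25, which is the geometric decay one expects from an AR(1) model.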

Since, conditioned on $w_{t-1}, \ldots, w_{t-T}$, the forecasts $\hat{w}_{t,t}, \ldots, \hat{w}_{t,t+M-1}$ are
conditionally independent of previous disturbances, there is a function $F_r$ which maps the $T$ most recent
disturbances to forecasts of the next $M$ disturbances. In particular, the vector of forecasts satisfies

$$\hat{w}_t^{t,t+M-1} = F_r\left(w_{t-T}^{t-1}\right),$$

where $w_{t-T}^{t-1} = [w_{t-T} \ \cdots \ w_{t-1}]^\top$. Note that $F_r$ depends on the coefficient vector $r$
and is generally not linear in $r$. From (2) we see that control actions generated by certainty equivalent MPC
based on forecasts produced by our model can be written as

$$u_t = L x_t + H F_r\left(w_{t-T}^{t-1}\right).$$

Let $\mu_r$ denote the control policy that maps $x_t$ and $w_{t-T}^{t-1}$ to $u_t$. Note that this control
policy depends on $r$ and again is generally not linear in $r$.



4 Time Series Regression Algorithms 

We now discuss algorithms for computing coefficients to fit a forecasting model of the kind we have described
to a time series. In particular, each algorithm will compute a vector $r \in \mathbb{R}^K$ given observed
samples $w_0, \ldots, w_{N-1}$.

4.1 Least Squares 

The most common approach is least squares regression (LS). Here, we compute coefficients

$$r^{LS} := \arg\min_r \sum_{t=T}^{N-1} \left(w_t - \sum_{k=1}^{K} r_k v_{k,t}\right)^2.$$

Note that the sum begins with $t = T$ because earlier samples $w_0, \ldots, w_{T-1}$ are required to compute
the features $v_{k,T}$. The optimization problem can be solved efficiently since it is a linear least squares
problem. The resulting coefficient vector $r^{LS}$ represents the maximum likelihood estimate assuming that
the observed time series is generated by our forecasting model. As such, this is a natural approach to fitting
a forecasting model that accurately captures the generating process.
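As an illustration, the LS fit can be sketched with a standard linear least squares solver. The function name `fit_ls` and the array layout of `phi` (row $k$ holds $\phi_{k,1}, \ldots, \phi_{k,T}$) are assumptions of this sketch.

```python
import numpy as np

def fit_ls(w, phi):
    """Least squares coefficients for w_t ~ sum_k r_k v_{k,t}, t = T, ..., N-1."""
    w = np.asarray(w, dtype=float)
    phi = np.asarray(phi, dtype=float)
    T = phi.shape[1]
    N = len(w)
    # Row t of V holds the features v_{k,t} = sum_tau phi[k, tau-1] * w[t - tau].
    V = np.array([phi @ w[t - T:t][::-1] for t in range(T, N)])
    r, *_ = np.linalg.lstsq(V, w[T:], rcond=None)
    return r
```

On a noise-free series generated by $w_t = 0.5 w_{t-1} - 0.3 w_{t-2}$ with features $v_{k,t} = w_{t-k}$, this recovers the coefficients $(0.5, -0.3)$ up to numerical precision.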
4.2 Empirical Optimization

LS does not take the control problem into account when computing coefficients. This assumes a separation
principle whereby model fitting can be decoupled from the subsequent selection of control actions. Empirical
optimization (EO) provides an alternative that does account for the control problem when computing
coefficients. The algorithm computes coefficients that minimize "historical cost":

$$r^{EO} := \arg\min_r \sum_{t=T}^{N-1} g\left(x_t^r, \mu_r\left(x_t^r, w_{t-T}^{t-1}\right)\right), \tag{3}$$

where $x_T^r$ is set to the initial state of the system and subsequent control actions
$u_T, u_{T+1}, \ldots, u_{N-1}$ are selected by $\mu_r$. The objective here is the sum of costs that would have
been realized if we were to apply certainty equivalent MPC alongside our forecast model with coefficients $r$,
starting at time $T$. It is easy to see that if the disturbance process is ergodic, as the number $N$ of
observed samples increases, historical average cost converges to future average cost, and therefore, EO
computes coefficients that optimize future average cost. However, for reasonable values of $N$, EO tends to
overspecialize to the data, and this results in poor future performance.
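To make the objective in (3) concrete, the following sketch evaluates the historical cost of a given coefficient vector by replaying the observed disturbances. It is an illustration under assumptions: the names `forecast` and `historical_cost` are hypothetical, the disturbance is scalar, and the policy matrices `L` and `H` are taken as given inputs.

```python
import numpy as np

def forecast(r, phi, past, M):
    """Roll the feature model forward M steps from past = [w_{t-T}, ..., w_{t-1}]."""
    T = phi.shape[1]
    buf = list(past[-T:])
    out = []
    for _ in range(M):
        lags = np.array(buf[::-1][:T])          # [w_{l-1}, ..., w_{l-T}]
        buf.append(float(r @ (phi @ lags)))
        out.append(buf[-1])
    return np.array(out)

def historical_cost(r, w, phi, A, B, C, G1, G2, G3, L, H, x0):
    """Sum of g(x_t, u_t) for t = T, ..., N-1 under the policy induced by r."""
    T, M = phi.shape[1], H.shape[1]
    x, total = np.asarray(x0, dtype=float), 0.0
    for t in range(T, len(w)):
        u = L @ x + H @ forecast(r, phi, w[t - T:t], M)   # action chosen before w_t
        total += x @ G1 @ x + x @ G2 @ u + u @ G3 @ u
        x = A @ x + B @ u + C * w[t]                      # then w_t is revealed
    return float(total)
```

A generic optimizer could then be applied to `historical_cost` as a function of `r`, for instance starting from the LS coefficients. As a sanity check, in a scalar system where the action cancels the state and the forecast, a perfect forecaster of a constant disturbance incurs zero cost while a forecaster that always predicts zero does not.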

It may appear difficult to compute $r^{EO}$ because $\mu_r$ is generally not linear in $r$. However, as in the
case of the computational study we will later discuss, we have had positive experience applying local
optimization methods initialized with $r^{LS}$. A more detailed description of our implementation is given in
the appendix.

Nevertheless, this non-convex optimization problem can still be challenging as the number of parameters
$K$ increases, mainly because evaluating the gradients and Hessians of $F_r$ requires iterative computation. We
now propose an approximation algorithm that relieves this problem by linearization. Note that $\mu_r$ is
typically close to linear in a region of $\mathbb{R}^K$ that includes reasonable choices of $r$, for reasons we
now explain. The policy $\mu_r$ can be written as a sum of terms that are linear and terms that are nonlinear
in $r$. Linear terms are multiples of terms of the form $r_k \phi_{k,\tau}$. Nonlinear terms are higher order
terms that involve products of terms of the form $r_k \phi_{k,\tau}$. In most problems of practical interest,
terms of the form $r_k \phi_{k,\tau}$ are significantly smaller than one because past disturbances exhibit
decaying influence on future ones. As such, the nonlinear terms tend to be significantly smaller than the
linear ones, which therefore play a dominant role in $\mu_r$. This motivates the following approximation
algorithm.

Suppose we have a reasonable estimate $\bar{r}$ for the model coefficients, which we will refer to as base
coefficients. In practice, a typical choice for it is $r^{LS}$. Given $\bar{r}$, it is straightforward to
iteratively generate forecasts $\hat{w}_{t,t}, \hat{w}_{t,t+1}, \ldots, \hat{w}_{t,t+M-1}$ and features
$\hat{v}_{k,t,t}, \hat{v}_{k,t,t+1}, \ldots, \hat{v}_{k,t,t+M-1}$, $k = 1, \ldots, K$. Now fix these features
and define

$$\tilde{w}_{t,\ell} := \sum_{k=1}^{K} r_k \hat{v}_{k,t,\ell}, \quad \ell = t, \ldots, t+M-1.$$

In other words, here we fix base coefficients $\bar{r}$ when rolling forward but allow another set of
coefficients $r$ to take effect at the last step of prediction. Clearly, $\tilde{w}_{t,\ell}$ is linear in $r$.
Furthermore, by similar notation we define

$$F_{\bar{r},r}\left(w_{t-T}^{t-1}\right) := [\tilde{w}_{t,t} \ \cdots \ \tilde{w}_{t,t+M-1}]^\top$$

and

$$\mu_{\bar{r},r}\left(x_t, w_{t-T}^{t-1}\right) := L x_t + H F_{\bar{r},r}\left(w_{t-T}^{t-1}\right),$$

both of which are also (affine) linear in $r$. As we can see, this approximation trades in some model
flexibility for linearity. This leads to an alternative formulation for EO, which we refer to as linearized EO
(LEO):

$$r^{LEO} := \arg\min_r \sum_{t=T}^{N-1} g\left(x_t^{\bar{r},r}, \mu_{\bar{r},r}\left(x_t^{\bar{r},r}, w_{t-T}^{t-1}\right)\right). \tag{4}$$

Unlike (3), (4) can be solved efficiently by quadratic programming. A policy function $\mu_{r^{LEO}}$ can then
be built upon $r^{LEO}$.
4.3 Directed Time Series Regression

Directed time series regression (DTSR) aims to combine the merits of LS and EO. Directed regression
was first introduced in Kao et al. (2009) in a context involving repeated independent decision problems as
opposed to intertemporal control. That work also developed a theory that motivated the algorithm and
demonstrated its benefits through a computational study. What we consider here is an extension of directed
regression that is suitable for time series.

A naive version of DTSR (NDR) produces a vector of coefficients that is a convex combination of those
computed by LS and EO:

$$r^{NDR} = (1 - \lambda) r^{LS} + \lambda r^{EO},$$

where $\lambda \in [0, 1]$ is selected by cross-validation. This algorithm, albeit simple and intuitive,
requires solving EO as an intermediate step and therefore can be inefficient for complicated problems. This
disadvantage motivates the following approximation algorithm, linearized DTSR (LDR), which produces a
coefficient vector by taking a convex combination of $r^{LS}$ and $r^{LEO}$:

$$r^{LDR} = (1 - \lambda) r^{LS} + \lambda r^{LEO}.$$

A policy function then follows.

We will use a sliding window cross-validation procedure to select the parameter $\lambda \in [0, 1]$ for NDR
and LDR. The windows are defined by a sequence $t_i \in \{T+1, T+2, \ldots, N-1\}$, each element specifying a
boundary that separates the $i$th training set window from the $i$th validation set window. To select the
parameter $\lambda$ for NDR, for each $i$ we compute $r^{LS}$ and $r^{EO}$ using the observations
$w_0, w_1, \ldots, w_{t_i}$, and then evaluate each choice of $\lambda_i$ by simulating the system with control
actions generated by $\mu_{(1-\lambda_i) r^{LS} + \lambda_i r^{EO}}$, starting at state $x_{t_i} = 0$. The
performance of each $\lambda_i$ is judged based on the sum of costs incurred over times $t_i + 1, \ldots, N-1$.
For each $i$, the best value of $\lambda_i$ is selected, and we take $\lambda$ to be the average over $i$ of the
selected values of $\lambda_i$. In our computational study, we set $t_1$, $t_2$, and $t_3$ to $T + 0.3(N - T)$,
$T + 0.5(N - T)$, and $T + 0.7(N - T)$, each rounded off to the closest integer. The cross-validation procedure
for LDR is essentially the same, except that $r^{EO}$ is replaced by $r^{LEO}$ and
$\mu_{(1-\lambda) r^{LS} + \lambda r^{EO}}$ by $\mu_{(1-\lambda) r^{LS} + \lambda r^{LEO}}$. Figure 2
illustrates this sliding window cross-validation procedure.

Figure 2: Sliding window cross-validation.
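The sliding window procedure can be sketched as follows. This is a minimal illustration under assumptions: the callables `fit_ls`, `fit_eo`, and `validation_cost` are hypothetical placeholders supplied by the user (the last one is expected to simulate the system from the zero state over the validation window), and the grid of candidate values for $\lambda$ is a choice made here, since the text does not specify one.

```python
import numpy as np

def default_boundaries(N, T, fractions=(0.3, 0.5, 0.7)):
    """Window boundaries t_i = T + f (N - T), rounded to an integer.
    Note that Python's round() sends exact halves to the nearest even integer."""
    return [int(round(T + f * (N - T))) for f in fractions]

def select_lambda(w, boundaries, fit_ls, fit_eo, validation_cost, grid=None):
    """Average, over windows, of the best mixing weight found on each window.

    fit_ls(train), fit_eo(train)   -> coefficient vectors (hypothetical callables)
    validation_cost(r, w, t_i)     -> cost over t_i + 1, ..., N - 1 (hypothetical)
    """
    if grid is None:
        grid = np.linspace(0.0, 1.0, 11)        # assumed candidate set
    chosen = []
    for t_i in boundaries:
        train = w[: t_i + 1]                    # observations w_0, ..., w_{t_i}
        r_ls, r_eo = fit_ls(train), fit_eo(train)
        costs = [validation_cost((1 - lam) * r_ls + lam * r_eo, w, t_i)
                 for lam in grid]
        chosen.append(grid[int(np.argmin(costs))])
    return float(np.mean(chosen))
```

For LDR, the same routine applies with a fitting routine for $r^{LEO}$ passed in place of `fit_eo`.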



5 Computational Study 

To assess relative merits of the algorithms we have discussed, we conducted a computational study involving 
an inverted pendulum problem of the kind described in Example 1. 






5.1 System Dynamics 



We discretize time, sampling at a rate of 100 Hz. We employ a linearization of this system around the balance
point, as derived in Landry et al. (2005). We also use the same physical parameters therein, and we assume
that at each time $t$ there is a gust of wind accelerating the pendulum's angle by $w_t$. Specifically, the
system dynamics are characterized by the following matrices:



[Matrices $A$, $B$, and $C$: the matrix layout was lost in extraction. Surviving entries, in order of
appearance: 1, 0.01, 1, 0.01, -0.0178, 0.8872, ..., 0.0198, -0.04871, 0.01.]



Since our goal is to keep the cart centered and the pendulum vertical while minimizing energy consumption,
we use a cost function parameterized by the following matrices:

$$G_1 = \begin{bmatrix} 1000 & 0 & 0 & 0 \\ 0 & 1000 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \quad G_2 = 0, \quad G_3 = 0.1.$$

Hence, the cost in period $t$ is $x_t^\top G_1 x_t + 0.1 u_t^2$, where the state is
$x_t = [s_t \ \theta_t \ \dot{s}_t \ \dot{\theta}_t]^\top$ and the action $u_t$ influences acceleration. Along
with the matrices $A$, $B$, $C$, $G_1$, $G_2$, and $G_3$, we take $M$ to be 100 in our experiment and evaluate
the policy matrices $L$ and $H$ accordingly.


5.2 Generative and Forecasting Models 

In our study, we randomly sample an ensemble of generative models, each of which is used to generate a
time series to which our algorithms are applied. Each generative model is sampled as follows:

1. Sample $\psi_1, \psi_2, \ldots, \psi_5$ i.i.d. from $\mathcal{N}(0, 1)$.

2. Sample $\psi_6, \psi_7, \ldots, \psi_{30}$ i.i.d. from a zero-mean normal distribution with a much smaller
variance.

3. Select $\beta \in \mathbb{R}_+$ such that the process

$$w_t = \beta \sum_{\tau=1}^{30} \psi_\tau w_{t-\tau} + z_t$$

has a variance $\mathrm{E}[w_t^2] = 2\sigma_z^2$.

4. Select $\sigma_z^2$ such that when control actions are selected according to $u_t = L x_t$, the average
expected cost is equal to one.

These steps result in an AR(30) process for which the first five coefficients tend to be an order of magnitude
larger than the next twenty-five. The last two steps in the sampling process deserve some explanation. Step
3 normalizes the signal-to-noise ratio of the disturbance process. This reduces variations among sampled
generative models in how helpful a forecasting model can be. Step 4 serves to normalize the control objective,
which would otherwise differ dramatically from one generative model to another. The control policy
$u_t = L x_t$ is one that is optimal when future disturbances are not forecastable. The noise variance
$\sigma_z^2$ is chosen so that the average expected cost when the system is controlled by this naive policy is
equal to one.
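Steps 1 through 3 can be sketched as follows. Several choices here are assumptions rather than details from the text: the standard deviation `tail_std` of the later coefficients (0.1 is used only because it makes them an order of magnitude smaller), the use of bisection to find $\beta$, and the function names. The stationary variance is computed by solving the discrete Lyapunov equation for the companion form of the autoregression. Step 4 is not covered.

```python
import numpy as np

def sample_psi(rng, head=5, total=30, tail_std=0.1):
    """Steps 1-2: unit-variance head coefficients, smaller tail (tail_std assumed)."""
    psi = rng.standard_normal(total)
    psi[head:] *= tail_std
    return psi

def variance_ratio(a):
    """E[w_t^2] / sigma_z^2 for w_t = sum_tau a[tau-1] w_{t-tau} + z_t,
    or infinity if the process is not stable."""
    p = len(a)
    F = np.zeros((p, p))                       # companion matrix
    F[0, :] = a
    F[1:, :-1] = np.eye(p - 1)
    if np.max(np.abs(np.linalg.eigvals(F))) >= 1.0:
        return np.inf
    e = np.zeros(p * p)
    e[0] = 1.0                                 # unit noise enters the first state
    # Solve vec(S) = (F kron F) vec(S) + vec(e1 e1^T).
    S = np.linalg.solve(np.eye(p * p) - np.kron(F, F), e)
    return S[0]

def find_beta(psi, target=2.0, iters=60):
    """Step 3 by bisection (assumes psi is not identically zero)."""
    lo, hi = 0.0, 1.0
    while variance_ratio(hi * psi) < target:
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if variance_ratio(mid * psi) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For an AR(1) process with $\psi_1 = 1$ the variance ratio is $1/(1 - \beta^2)$, so the routine should return $\beta = \sqrt{1/2}$.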

Since the five most recent disturbances dominate others in influencing future disturbances, it is natural
to select them as features of a forecasting model. In particular, we consider a forecasting model with five
features:

$$v_{k,t} = w_{t-k}, \quad k = 1, \ldots, 5.$$

This model approximates the generative model but at the same time neglects useful information that can
be extracted from disturbances observed beyond five time periods in the past. It is our intention to consider
a forecasting model that does not perfectly capture the generative model since it is in such contexts that
DTSR adds value. It is also important to recognize that this sort of model misspecification is inevitable in
practical applications.



5.3 Results 

For our computational study, we sampled 2,000 generative models using the above procedure. For each
model, a sequence of 5,360 disturbances $w'_0, \ldots, w'_{5359}$ is generated, initialized with an assumption
that all disturbances prior to $w'_0$ are equal to zero. We take the last $N$ samples of this trajectory as
observations to be used by regression algorithms. In other words, each algorithm makes use of disturbances
$w_t = w'_{5360-N+t}$ for $t = 0, 1, \ldots, N-1$. In our experiment, we repeat the LS, EO, NDR, LEO, and LDR
algorithms for each $N \in \{200, 240, \ldots, 360\}$.

To provide a lower bound, we consider a "model-clairvoyant (MC) policy." This is the optimal policy
to use given full knowledge of the generative model. The average expected cost $g^*$ incurred by this policy,
which we will refer to as the MC cost, can be derived in closed form. Clearly, policies generated by the
aforementioned five algorithms must incur average expected costs no less than $g^*$.

We can also evaluate in closed form the average expected costs $g^{LS}$, $g^{EO}$, $g^{NDR}$, $g^{LEO}$, and
$g^{LDR}$ of policies generated by our regression algorithms. We will focus our presentation on excess costs
$g^{LS} - g^*$, $g^{EO} - g^*$, $g^{NDR} - g^*$, $g^{LEO} - g^*$, and $g^{LDR} - g^*$. Subtracting $g^*$
factors out the influence of costs that are inevitable regardless of the choice of control policy. Figure 3
plots excess costs as a function of $N$.



These results are qualitatively similar to those reported in Kao et al. (2009) in a context involving repeated
independent decision problems rather than intertemporal control. For small $N$, EO and LEO suffer from
overspecialization. For large $N$, the degree of overspecialization diminishes and the bias in LS due to model
misspecification prevents LS from doing as well as EO and LEO. LDR always outperforms the best of these
three methods, sometimes by as much as 12%. It is also interesting to note that LDR provides a consistent
gain over NDR, in spite of the substantial saving in computation.

We can also compare quantities with simple physical interpretations. Suppose this cart-pendulum system
is considered to reach a state of failure if the cart's position or the pendulum's angle deviates from zero
by more than certain threshold values. Let the thresholds on position and angle be $B_s = 0.0392$ and
$B_\theta = 0.0364$, which are roughly twice the root-mean-square values of $s_t$ and $\theta_t$ in our
simulations. We refer to the number of time periods elapsed before a system initialized at the zero state
reaches failure as time-until-failure. For each of our 2,000 models, we generate 50 disturbance trajectories.
For each of these trajectories and each of the policies under consideration (MC, LS, EO, NDR, LEO, and LDR),
we simulated the system until failure. Figure 4 summarizes results. It is clear that NDR and LDR are much
closer to MC than LS, EO, and LEO are in terms of time-until-failure.
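The time-until-failure measurement can be sketched as a short simulation loop. The function name and the `policy(x, past)` signature are hypothetical, and the sketch returns the trajectory length when no failure occurs within the supplied disturbances, which is a convention chosen here rather than one stated in the text.

```python
import numpy as np

def time_until_failure(A, B, C, policy, w, s_max=0.0392, theta_max=0.0364):
    """Number of periods elapsed before |s_t| or |theta_t| exceeds its threshold.

    policy(x, past) -> scalar action given the state and the disturbances seen
    so far (hypothetical signature). The state starts at zero; x[0] is the cart
    position and x[1] is the pendulum angle.
    """
    x = np.zeros(A.shape[0])
    for t in range(len(w)):
        if abs(x[0]) > s_max or abs(x[1]) > theta_max:
            return t
        u = policy(x, w[:t])                       # chosen before w_t is observed
        x = A @ x + B @ np.atleast_1d(u) + C * w[t]
    return len(w)                                  # no failure within the trajectory
```

As a toy check with no control and an angle that drifts by 0.01 per period, the angle threshold 0.0364 is first exceeded after four periods.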



6 Conclusion 

In this paper we have presented directed time-series regression, an algorithm that computes coefficients of 
a time-series model for use in certainty equivalent model predictive control. Two versions of this algorithm 
have been proposed, namely NDR and LDR, which we will collectively refer to as DTSR in the following 
discussion. We have shown that when the quantity of available data is limited and the forecasting model 
differs from the generative model, this algorithm offers a significant advantage by combining merits of least 
squares regression and empirical optimization. As side products, we have also proposed a methodology that 
transforms the original problem into a linearized version, and a sliding window cross-validation algorithm 
for time series with control. Both techniques can potentially enlarge the applicability of directed regression 
theory in other fields. 

The main point of the paper is that the coordination of forecasting and control can be fruitful and we 
hope this observation will stimulate work along these lines involving broader classes of control problems and 
learning algorithms. Our approach extends to more complex settings than the one considered in this paper, 
for example, cases where disturbances are multidimensional and the system is nonlinear. It would also be 
interesting to explore the use of DTSR in contexts involving non-stationary time series. 
Figure 3: The excess costs delivered by LS, EO, NDR, LEO, and LDR for different numbers of training
samples N; the curves of EO and LEO are essentially the same. Note that LDR is the dominant solution in
all cases, reducing the excess costs delivered by LS/EO by as much as 12%.



Let us conclude with a discussion of several threads of work that relate to the ideas we have presented.
Motivated by a similar perspective on fitting a misspecified model for use by a control algorithm, the approach
developed in Abbeel and Ng (2004) learns a first-order Markov model in a way that improves controller
performance when data is not generated by a first-order Markov model. This approach takes the discount
factor of the control problem into account when learning the transition matrix. Aside from the contextual
differences between discrete Markov processes and linear autoregressive models, a conceptual advance in our
work can be seen in the fact that DTSR takes the entire control objective into account.

Differences between LS and EO are analogous to differences that have been studied between generative
and discriminative methods for learning (see, e.g., Ng and Jordan (2001)). When data samples are scarce,
generative methods often provide better results, as does LS. On the other hand, when there is ample data,
discriminative methods are advantageous, as is the case for EO. DTSR provides a useful way of combining
the merits of LS and EO, and the idea may generalize to offer an approach that can more broadly be used
to combine merits of generative and discriminative methods.

Control theorists working on system identification typically adopt weighted least-squares linear regression
to fit a linear system (Ljung, 1998). While this approach puts more emphasis on learning the critical
components, it does not explicitly consider the control objective. Econometricians have analyzed properties of
parameter estimates for misspecified time series models (White, 198?; Domowitz and White, 1982). However,
this line of work does not treat the use of such models and estimated parameters for decision or control.
Operations researchers have developed methods that factor objectives into how point estimates such as
forecasts are generated when they are to be used for decision or control (Granger, 1969; Liyanage and
Shanthikumar, 2005; Chua et al., 2008). However, this line of work does not consider model misspecification.



Appendix 

Implementation Details for EO 

The particular local optimization method we use in our computational study is a variation of the
Gauss-Newton method (Bertsekas, 1999). This suits our problem well because $g$ is quadratic. The particular
variation we use deviates from the standard Gauss-Newton method in that each iteration carries out a
backtracking line search to determine the size of the step taken toward what would be the next Gauss-Newton
iterate. Further, whenever the approximate Hessian is not positive definite, we add to it a small
multiple of the identity matrix, in the spirit of the Levenberg-Marquardt method. In our computational
study, the backtracking line search starts with a step size of one and halves the step size repeatedly so long
as that improves the objective value in (3). When a multiple of the identity matrix is added, which is rare,
the multiplier is 0.001.

Figure 4: The average time-until-failure delivered by MC, LS, EO, NDR, LEO, and LDR, when a training
data set of size 240 is given. Note that NDR and LDR are much closer to MC than LS, EO, and LEO are.
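A single iteration of the kind described above can be sketched for a generic sum-of-squares objective. This is an illustration under assumptions, not the implementation used in the study: the callables `residual` and `jacobian` are hypothetical, the objective is taken to be the squared norm of the residual vector, positive definiteness is tested with a Cholesky factorization, and the number of halvings is capped.

```python
import numpy as np

def gauss_newton_step(residual, jacobian, r, ridge=1e-3, max_halvings=30):
    """One damped Gauss-Newton iteration for f(r) = ||residual(r)||^2.

    residual(r) -> (m,) array, jacobian(r) -> (m, K) array (hypothetical callables).
    """
    res = residual(r)
    J = jacobian(r)
    hess = J.T @ J                                  # Gauss-Newton Hessian approximation
    try:
        np.linalg.cholesky(hess)                    # fails if not positive definite
    except np.linalg.LinAlgError:
        hess = hess + ridge * np.eye(len(r))        # small multiple of the identity
    d = -np.linalg.solve(hess, J.T @ res)           # direction to the next iterate

    def objective(s):
        return float(np.sum(residual(r + s * d) ** 2))

    s = 1.0                                         # start with a full step
    for _ in range(max_halvings):
        if objective(s / 2) < objective(s):         # halve while that improves f
            s /= 2
        else:
            break
    return r + s * d
```

When the residual is linear in $r$, the full step already lands on the least squares solution, so no halving takes place.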